A Mathematical Framework for Feature Selection from Real-World Data with Non-Linear Observations

Authors

  • Martin Genzel
  • Gitta Kutyniok

Abstract

In this paper, we study the challenge of feature selection based on a relatively small collection of sample pairs $\{(x_i, y_i)\}_{1 \le i \le m}$. The observations $y_i \in \mathbb{R}$ are thereby supposed to follow a noisy single-index model, depending on a certain set of signal variables. A major difficulty is that these variables usually cannot be observed directly, but rather arise as hidden factors in the actual data vectors $x_i \in \mathbb{R}^d$ (feature variables). We will prove that a successful variable selection is still possible in this setup, even when the applied estimator does not have any knowledge of the underlying model parameters and only takes the "raw" samples $\{(x_i, y_i)\}_{1 \le i \le m}$ as input. The model assumptions of our results will be fairly general, allowing for non-linear observations, arbitrary convex signal structures as well as strictly convex loss functions. This is particularly appealing for practical purposes, since in many applications, already standard methods, e.g., the Lasso or logistic regression, yield surprisingly good outcomes. Apart from a general discussion of the practical scope of our theoretical findings, we will also derive a rigorous guarantee for a specific real-world problem, namely sparse feature extraction from (proteomics-based) mass spectrometry data.
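To make the setting concrete, here is a minimal Python/NumPy sketch; it is not the authors' method or code. It assumes i.i.d. standard Gaussian feature vectors (so it omits the hidden-factor structure studied in the paper), a hypothetical link function $f(t) = \tanh(2t)$ that the estimator never sees, and a penalized Lasso solved by ISTA (proximal gradient descent). The dimensions `m`, `d`, `s`, the noise level, and the penalty `lam = 0.1` are illustrative choices. Even though the observations are non-linear, the Lasso fitted to the raw samples $(x_i, y_i)$ should typically place its largest coefficients on the signal variables.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (componentwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_ista(X, y, lam, n_iter=2000):
    """Minimize (1/(2m)) * ||y - X b||_2^2 + lam * ||b||_1 with ISTA,
    i.e. proximal gradient descent with constant step size 1/L."""
    m, d = X.shape
    # Lipschitz constant of the gradient of the smooth (squared-loss) part.
    L = np.linalg.norm(X, 2) ** 2 / m
    b = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / m
        b = soft_threshold(b - grad / L, lam / L)
    return b


rng = np.random.default_rng(0)
m, d, s = 200, 50, 3  # illustrative sample size, dimension, sparsity

# Hypothetical ground truth: unit-norm direction supported on the first s features.
z0 = np.zeros(d)
z0[:s] = 1.0 / np.sqrt(s)

# Assumed Gaussian features and non-linear single-index observations
# y_i = tanh(2 <x_i, z0>) + noise; the estimator below never sees tanh.
X = rng.standard_normal((m, d))
y = np.tanh(2.0 * X @ z0) + 0.1 * rng.standard_normal(m)

b_hat = lasso_ista(X, y, lam=0.1)
support = np.flatnonzero(np.abs(b_hat) > 1e-3)
print("selected features:", support)
```

Swapping the squared loss for another strictly convex loss, such as the logistic loss when $y_i \in \{-1, +1\}$, fits the same template; only the gradient computation and the step size in `lasso_ista` would change.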

Similar resources

A New Hybrid Framework for Filter based Feature Selection using Information Gain and Symmetric Uncertainty (TECHNICAL NOTE)

Feature selection is a pre-processing technique used for eliminating the irrelevant and redundant features which results in enhancing the performance of the classifiers. When a dataset contains more irrelevant and redundant features, it fails to increase the accuracy and also reduces the performance of the classifiers. To avoid them, this paper presents a new hybrid feature selection method usi...

A new framework for high-technology project evaluation and project portfolio selection based on Pythagorean fuzzy WASPAS, MOORA and mathematical modeling

High-technology projects are known as tools that help achieving productive forces through scientific and technological knowledge. These knowledge-based projects are associated with high levels of risks and returns. The process of high-technology project and project portfolio selection has technical complexities and uncertainties. This paper presents a novel two-parted method of high-technology ...

A Parallel Genetic Algorithm Based Method for Feature Subset Selection in Intrusion Detection Systems

Intrusion detection systems are designed to provide security in computer networks, so that if the attacker crosses other security devices, they can detect and prevent the attack process. One of the most essential challenges in designing these systems is the so called curse of dimensionality. Therefore, in order to obtain satisfactory performance in these systems we have to take advantage of app...

Stock Market Modeling Using Artificial Neural Network and Comparison with Classical Linear Models

Stock market plays an important role in the world economy. Stock market customers are interested in predicting the stock market general index price, since their income depends on this financial factor; Therefore, a reliable forecast in stock market can be extremely profitable for stockholders. Stock market prediction for financial markets has been one of the main challenges in forecasting finan...

Journal:
  • CoRR

Volume: abs/1608.08852

Publication date: 2016